Abstract

Phrase-structure grammars are effective models for important syntactic and semantic aspects of natural languages, but can be computationally too demanding for use as language models in real-time speech recognition. Therefore, finite-state models are used instead, even though they lack expressive power. To reconcile those two alternatives, we designed an algorithm to compute finite-state approximations of context-free grammars and context-free-equivalent augmented phrase-structure grammars. The approximation is exact for certain context-free grammars generating regular languages, including all left-linear and right-linear context-free grammars. The algorithm has been used to build finite-state language models for limited-domain speech recognition tasks.

1 Motivation 

Grammars for spoken language systems are subject to the conflicting requirements of language modeling for recognition and of language analysis for sentence interpretation. For efficiency reasons, most current recognition systems rely on finite-state language models. These models, however, are inadequate for language interpretation, since they cannot express the relevant syntactic and semantic regularities. Augmented phrase structure grammar (APSG) formalisms, such as unification grammars [?], can express many of those regularities, but they are computationally less suitable for language modeling because of the inherent cost of computing state transitions in APSG parsers.

The above conflict can be alleviated by using separate grammars for language modeling and language interpretation. Ideally, the recognition grammar should not reject sentences acceptable by the interpretation grammar, and as far as possible it should enforce the constraints built into the interpretation grammar. However, if the two grammars are built independently, those goals are difficult to maintain. For that reason, we have developed a method for approximating APSGs with finite-state acceptors (FSAs). Since such an approximation is intended to serve as the language model for a speech-recognition front-end to the real parser, we require it to be sound, in the sense that the approximation accepts all strings in the language defined by the APSG. Without qualification, the term "approximation" will always mean here "sound approximation."

If no further requirements were placed on the closeness of the approximation, the trivial algorithm that assigns to any APSG over the alphabet $\Sigma$ the regular language $\Sigma^*$ would do, but of course this language model is useless. One possible criterion for "goodness" of approximation arises from the observation that many interesting phrase-structure grammars have substantial parts that accept regular languages. That does not mean that grammar rules are in the standard forms for defining regular languages (left-linear or right-linear), because syntactic and semantic considerations often require that strings in a regular set be assigned structural descriptions not definable by left- or right-linear rules. An ideal criterion would thus be that if a grammar generates a regular language, the approximation algorithm yields an acceptor for that regular language. In other words, one would like the algorithm to be exact for all APSGs yielding regular languages. However, we will see later that no such general algorithm is possible; that is, any approximation algorithm will be inexact for some APSGs yielding regular languages. Nevertheless, we will show that our method is exact for left-linear and right-linear grammars, and for certain useful combinations thereof.

2 The Approximation Method 

Our approximation method applies to any context-free grammar (CFG), or any constraint-based grammar [?] that can be fully expanded into a context-free grammar.¹ The resulting FSA accepts all the sentences accepted by the input grammar, and possibly some non-sentences as well.

The implementation takes as input unification grammars of a restricted form 
ensuring that each feature ranges over a finite set. Clearly, such grammars can 
only generate context-free languages, since an equivalent CFG can be obtained 
by instantiating features in rules in all possible ways. 
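The instantiation step can be sketched as follows. This is a minimal illustration under our own assumptions, not the input notation of the grammar compiler: a rule is a list of `(category, features)` pairs whose first element is the left-hand side, a feature value beginning with an uppercase letter is a variable, and each variable ranges over the finite value set of the feature it annotates. The names `instantiate` and `FEATURE_VALUES` are hypothetical; the feature values are taken from Table 2.

```python
from itertools import product

# Hypothetical finite feature domains (values as in Table 2).
FEATURE_VALUES = {"n": ["s", "p"], "p": ["1", "2", "3"], "c": ["s", "o"]}


def is_variable(value):
    """Feature values written with a leading uppercase letter are variables."""
    return value[:1].isupper()


def instantiate(rule, feature_values=FEATURE_VALUES):
    """Yield every ground context-free rule obtained from rule.

    rule is a list [(category, {feature: value})]; the first element is the
    left-hand side. Each ground symbol is spelled category[f=v][f=v]...
    """
    domains = {}
    for _, features in rule:
        for feature, value in features.items():
            if is_variable(value):
                domains.setdefault(value, feature_values[feature])
    variables = sorted(domains)
    for values in product(*(domains[v] for v in variables)):
        binding = dict(zip(variables, values))
        yield tuple(
            category + "".join(f"[{f}={binding.get(v, v)}]" for f, v in sorted(features.items()))
            for category, features in rule
        )


# s -> np vp with number and person agreement, np in subject case.
RULE = [("s", {"n": "N", "p": "P"}),
        ("np", {"n": "N", "p": "P", "c": "s"}),
        ("vp", {"n": "N", "p": "P"})]

for ground in instantiate(RULE):
    print(ground[0], "->", " ".join(ground[1:]))
```

Because every variable ranges over a finite set, the expansion is finite, although the number of ground rules grows with the product of the domain sizes of the variables in each rule (here 2 × 3 = 6).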

2.1 The Basic Algorithm 

The heart of our approximation method is an algorithm to convert the LR(0) characteristic machine $\mathcal{M}(G)$ [?] of a CFG $G$ into an FSA for a superset of the language $L(G)$ defined by $G$. The characteristic machine for a CFG $G$ is an FSA for the viable prefixes of $G$, which are just the possible stacks built by the standard shift-reduce recognizer for $G$ when recognizing strings in $L(G)$.

¹ Unification grammars not in this class must first be weakened using techniques such as Shieber's restrictor [?].

This is not the place to review the characteristic machine construction in detail. However, to explain the approximation algorithm we will need to recall the main aspects of the construction. The states of $\mathcal{M}(G)$ are sets of dotted rules $A \to \alpha \cdot \beta$, where $A \to \alpha\beta$ is some rule of $G$. $\mathcal{M}(G)$ is the determinization by the standard subset construction [?] of the FSA defined as follows:

• The initial state is the dotted rule $S' \to \cdot S$, where $S$ is the start symbol of $G$ and $S'$ is a new auxiliary start symbol.

• The final state is $S' \to S \cdot$.

• The other states are all the possible dotted rules of $G$.

• There is a transition labeled $X$, where $X$ is a terminal or nonterminal symbol, from $A \to \alpha \cdot X\beta$ to $A \to \alpha X \cdot \beta$.

• There is an $\epsilon$-transition from $A \to \alpha \cdot B\beta$ to $B \to \cdot\gamma$, where $B$ is a nonterminal symbol and $B \to \gamma$ is a rule in $G$.
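The construction can be sketched in Python as below, assuming a grammar is a dictionary from nonterminals to lists of right-hand sides (tuples of symbols), with every symbol that is not a key treated as a terminal, and a dotted rule $A \to \alpha \cdot \beta$ encoded as the triple `(A, alpha + beta, len(alpha))`. Rather than building the nondeterministic FSA and then applying the subset construction, the sketch uses the equivalent closure/goto formulation: `closure` follows the $\epsilon$-transitions of the last bullet and `goto` follows the $X$-transitions. The function names and the auxiliary symbol `S'` are our own choices.

```python
from collections import deque


def closure(items, grammar):
    """Follow the epsilon-transitions: add B -> .gamma for every A -> alpha . B beta."""
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for gamma in grammar[rhs[dot]]:
                item = (rhs[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    todo.append(item)
    return frozenset(result)


def goto(state, symbol, grammar):
    """Follow the X-transitions out of a state (moving the dot over symbol), then close."""
    core = {(lhs, rhs, dot + 1) for lhs, rhs, dot in state
            if dot < len(rhs) and rhs[dot] == symbol}
    return closure(core, grammar)


def characteristic_machine(grammar, start, aux="S'"):
    """Return the LR(0) states (frozensets of dotted rules) and delta[(i, X)] = j."""
    g = dict(grammar)
    g[aux] = [(start,)]
    s0 = closure({(aux, (start,), 0)}, g)
    states, index, delta = [s0], {s0: 0}, {}
    queue = deque([s0])
    while queue:
        state = queue.popleft()
        # sorting only makes the breadth-first state numbering reproducible
        for x in sorted({rhs[dot] for _, rhs, dot in state if dot < len(rhs)}):
            target = goto(state, x, g)
            if target not in index:
                index[target] = len(states)
                states.append(target)
                queue.append(target)
            delta[(index[state], x)] = index[target]
    return states, delta


G1 = {"S": [("A", "b")], "A": [("A", "a"), ()]}
states, delta = characteristic_machine(G1, "S")
reducing = [i for i, s in enumerate(states) if any(dot == len(rhs) for _, rhs, dot in s)]
print(len(states), reducing)  # 5 [0, 2, 3, 4]
```

For the left-linear grammar $S \to Ab$, $A \to Aa \mid \epsilon$ used as the first example of this section, the sketch finds five states. The list `reducing` also contains the final state (here numbered 2), since $S' \to S\cdot$ is a completed dotted rule; state numbers come from a breadth-first traversal and need not match the figures.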

$\mathcal{M}(G)$ can be seen as the finite-state control for a nondeterministic shift-reduce pushdown recognizer $\mathcal{R}(G)$ for $G$. A state transition labeled by a terminal symbol $x$ from state $s$ to state $s'$ licenses a shift move, pushing onto the stack of the recognizer the pair $(s, x)$. Arrival at a state containing a completed dotted rule $A \to \alpha\cdot$ licenses a reduction move. This pops $|\alpha|$ elements from the stack, checking that the symbols in the pairs match the corresponding elements of $\alpha$, takes the transition labeled by $A$ from the state $s$ in the last pair popped, and pushes $(s, A)$ onto the stack. (Full definitions of those concepts are given in Section 3.)

The basic ingredient of our approximation algorithm is the flattening of a shift-reduce recognizer for a grammar $G$ into an FSA by eliminating the stack and turning reduce moves into $\epsilon$-transitions. It will be seen below that flattening $\mathcal{R}(G)$ directly leads to poor approximations in many interesting cases. Instead, $\mathcal{M}(G)$ must first be unfolded into a larger machine whose states carry information about the possible shift-reduce stacks of $\mathcal{R}(G)$. The quality of the approximation is crucially influenced by how much stack information is encoded in the states of the unfolded machine: too little leads to coarse approximations, while too much leads to redundant automata needing very expensive optimization.

The algorithm is best understood with a simple example. Consider the left-linear grammar $G_1$:

$S \to A b$
$A \to A a \mid \epsilon$

$\mathcal{M}(G_1)$ is shown in Figure 1. Unfolding is not required for this simple example, so the approximating FSA is obtained from $\mathcal{M}(G_1)$ by the flattening method outlined above. The reducing states in $\mathcal{M}(G_1)$, those containing completed dotted rules, are states 0, 3 and 4. For instance, the reduction at state 3 would lead to an $\mathcal{R}(G_1)$ transition on nonterminal $S$ to state 1, from the state that activated the rule being reduced. Thus the corresponding $\epsilon$-transition goes from state 3 to state 1. Adding all the transitions that arise in this way, we obtain the FSA in Figure 2. From this point on, the arcs labeled with nonterminals can be deleted, and after simplification we obtain the deterministic finite automaton (DFA) in Figure 3, which is the minimal DFA for $L(G_1)$.

Figure 1: Characteristic Machine for $G_1$

Figure 2: Flattened Canonical Acceptor for $L(G_1)$

If flattening were always applied to the LR(0) characteristic machine as in the example above, even simple grammars defining regular languages might be inexactly approximated by the algorithm. The reason for this is that, in general, the reduction at a given reducing state in the characteristic machine transfers to different states depending on stack contents. In other words, the reducing state might be reached by different routes which use the result of the reduction in different ways. The following grammar $G_2$

$S \to a X a \mid b X b$
$X \to c$

accepts just the two strings $aca$ and $bcb$, and has the characteristic machine $\mathcal{M}(G_2)$ shown in Figure 4. However, the corresponding flattened acceptor shown in Figure 5 also accepts $acb$ and $bca$, because the $\epsilon$-transitions leaving state 5 do not distinguish between the different ways of reaching that state encoded in the stack of $\mathcal{R}(G_2)$.

Figure 3: Minimal Acceptor for $L(G_1)$

Figure 4: Characteristic Machine for $G_2$

Figure 5: Flattened Acceptor for $L(G_2)$
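To make the overgeneration concrete, the following sketch flattens $\mathcal{M}(G)$ directly, without unfolding. It repeats a compact version of the LR(0) construction under the same assumed encoding (dotted rules as `(lhs, rhs, dot)`, auxiliary start symbol `S'`, both our own choices). Each completed rule $A \to \alpha\cdot$ in a state $p$ yields an $\epsilon$-arc from $p$ to $\delta(q, A)$ for every state $q$ that contains $A \to \cdot\alpha$ and reaches $p$ on $\alpha$; nonterminal arcs are used only to compute these targets and are never followed on input. Determinization and minimization are omitted, since they do not change the accepted language.

```python
from collections import deque


def closure(items, g):
    """Close dotted rules (lhs, rhs, dot) under predictions B -> .gamma."""
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in g:
            for gamma in g[rhs[dot]]:
                if (rhs[dot], gamma, 0) not in result:
                    result.add((rhs[dot], gamma, 0))
                    todo.append((rhs[dot], gamma, 0))
    return frozenset(result)


def characteristic_machine(grammar, start, aux="S'"):
    """LR(0) states, numbered breadth-first, and transitions delta[(i, X)] = j."""
    g = dict(grammar)
    g[aux] = [(start,)]
    states = [closure({(aux, (start,), 0)}, g)]
    index, delta, queue = {states[0]: 0}, {}, deque([0])
    while queue:
        i = queue.popleft()
        for x in sorted({r[d] for _, r, d in states[i] if d < len(r)}):
            t = closure({(l, r, d + 1) for l, r, d in states[i] if d < len(r) and r[d] == x}, g)
            if t not in index:
                index[t] = len(states)
                states.append(t)
                queue.append(index[t])
            delta[(i, x)] = index[t]
    return states, delta


def accepts(grammar, start, word, aux="S'"):
    """Flatten M(G) directly (no unfolding) and run the resulting NFA on word."""
    states, delta = characteristic_machine(grammar, start, aux)
    eps = {p: set() for p in range(len(states))}
    for p, state in enumerate(states):
        for lhs, rhs, dot in state:
            if lhs == aux or dot < len(rhs):
                continue
            # reduce A -> alpha. at p: epsilon-arc to delta(q, A) for each q
            # that contains A -> .alpha and reaches p by reading alpha
            for q, other in enumerate(states):
                if (lhs, rhs, 0) in other:
                    r = q
                    for x in rhs:
                        r = delta[(r, x)]
                    if r == p:
                        eps[p].add(delta[(q, lhs)])
    finals = {p for p, s in enumerate(states) if any(l == aux and d == 1 for l, _, d in s)}

    def eclose(ps):
        seen, todo = set(ps), list(ps)
        while todo:
            for q in eps[todo.pop()]:
                if q not in seen:
                    seen.add(q)
                    todo.append(q)
        return seen

    current = eclose({0})
    for a in word:  # only terminal arcs are ever followed on input
        current = eclose({delta[(p, a)] for p in current if (p, a) in delta})
    return bool(current & finals)


G1 = {"S": [("A", "b")], "A": [("A", "a"), ()]}
G2 = {"S": [("a", "X", "a"), ("b", "X", "b")], "X": [("c",)]}
print([w for w in ["b", "ab", "aab", "a", "ba"] if accepts(G1, "S", w)])  # ['b', 'ab', 'aab']
print([w for w in ["aca", "bcb", "acb", "bca"] if accepts(G2, "S", w)])  # ['aca', 'bcb', 'acb', 'bca']
```

On $G_1$ the directly flattened acceptor behaves exactly on the strings tried, while on $G_2$ the single state reached after $c$ sends its $\epsilon$-arcs to both continuation states, so $acb$ and $bca$ are accepted as well.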

Our solution for the problem just described is to unfold each state of the characteristic machine into a set of states corresponding to different stacks at that state, and to flatten the corresponding recognizer rather than the original one. Figure 6 shows the resulting acceptor for $L(G_2)$, now exact, after determinization and minimization.

Figure 6: Exact Acceptor for $L(G_2)$
In general the set of possible stacks at a state is infinite. Therefore, it is necessary to do the unfolding not with respect to stacks, but with respect to a finite partition of the set of stacks possible at the state, induced by an appropriate equivalence relation. The relation we use currently makes two stacks equivalent if they can be made identical by collapsing loops, that is, by removing in a canonical way portions of stack pushed between two arrivals at the same state in the finite-state control of the shift-reduce recognizer, as described more formally at the end of Section 3.1. The purpose of collapsing a loop is to "forget" a stack segment that may be arbitrarily repeated.² Each equivalence class is uniquely defined by the shortest stack in the class, and the classes can be constructed without having to consider all the (infinitely) many possible stacks.



2.2 Grammar Decomposition 

Finite-state approximations computed by the basic algorithm may be extremely large, and their determinization, which is required by minimization [?], can be computationally infeasible. These problems can be alleviated by first decomposing the grammar to be approximated into subgrammars and approximating the subgrammars separately before combining the results.

Each subgrammar in the decomposition of a grammar $G$ corresponds to a set of nonterminals that are involved, directly or indirectly, in each other's definition, together with their defining rules. More precisely, we define a directed graph conn($G$) whose nodes are $G$'s nonterminal symbols, and which has an arc from $X$ to $Y$ whenever $Y$ appears in the right-hand side of a rule of $G$ whose left-hand side is $X$. Each strongly connected component of this graph [?] corresponds to a set of mutually recursive nonterminals.
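A small sketch of the decomposition follows, assuming a grammar is a dictionary from nonterminals to lists of right-hand sides (tuples of symbols). For brevity it computes each strongly connected component as a set of mutually reachable nonterminals, which is quadratic in the number of nonterminals; Tarjan's linear-time algorithm would be the usual choice for large grammars. The names `conn`, `comp` and `defining_subgrammar` mirror the notation of the text but are otherwise hypothetical.

```python
def conn(grammar):
    """Arc X -> Y whenever Y occurs on the right-hand side of a rule for X."""
    return {x: {y for rhs in alts for y in rhs if y in grammar} for x, alts in grammar.items()}


def reachable(graph, x):
    """All nodes reachable from x, including x itself."""
    seen, todo = {x}, [x]
    while todo:
        for y in graph[todo.pop()]:
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen


def comp(grammar):
    """Map each nonterminal to its strongly connected component (mutual reachability)."""
    graph = conn(grammar)
    reach = {x: reachable(graph, x) for x in graph}
    return {x: frozenset(y for y in reach[x] if x in reach[y]) for x in graph}


def defining_subgrammar(grammar, x):
    """def(X): the rules prod(X) of comp(X), plus the pseudoterminals rhs(X) - comp(X)."""
    c = comp(grammar)[x]
    prod = {a: grammar[a] for a in c}
    pseudo = {y for a in c for rhs in grammar[a] for y in rhs if y in grammar and y not in c}
    return prod, pseudo


G = {"S": [("NP", "VP")], "NP": [("d", "n"), ("NP", "PP")],
     "PP": [("p", "NP")], "VP": [("v", "NP")]}
print(sorted(comp(G)["NP"]))                    # ['NP', 'PP']
print(sorted(defining_subgrammar(G, "S")[1]))   # ['NP', 'VP']
```

Here NP and PP are mutually recursive and form one component, while S and VP each form a singleton component in which NP is a pseudoterminal.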

Each nonterminal $X$ of $G$ is in exactly one strongly connected component comp($X$) of conn($G$). Let prod($X$) be the set of $G$ rules with left-hand sides in comp($X$), and rhs($X$) be the set of nonterminals occurring in the right-hand sides of the rules in prod($X$). Then the defining subgrammar def($X$) of $X$ is the grammar with start symbol $X$, nonterminal symbols comp($X$), terminal symbols $\Sigma \cup (\mathrm{rhs}(X) - \mathrm{comp}(X))$ and rules prod($X$). In other words, the nonterminal symbols not in comp($X$) are treated as pseudoterminal symbols in def($X$).

Each grammar def($X$) can be approximated with our basic algorithm, yielding an FSA aut($X$). To see how to merge these subgrammar approximations into an approximation for the whole of $G$, we observe first that the notion of strongly connected component allows us to take each aut($X$) as a node in a directed acyclic graph with an arc from aut($X$) to aut($X'$) whenever $X'$ is a pseudoterminal of def($X$). We can then replace each occurrence of a pseudoterminal $X'$ by its definition. More precisely, each transition labeled by a pseudoterminal $X'$ from some state $s$ to state $s'$ in aut($X$) is replaced by $\epsilon$-transitions from $s$ to the initial state of a separate copy of aut($X'$) and $\epsilon$-transitions from the final states of the copy of aut($X'$) to $s'$. This process is then recursively applied to each of the newly created instances of aut($X'$) for each pseudoterminal $X'$ in def($X$). Since the subautomata dependency graph is acyclic, the replacement process must terminate.

² Since possible stacks can be shown to form a regular language, loop collapsing has a direct connection to the pumping lemma for regular languages.
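The substitution of pseudoterminals can be sketched as follows, with an FSA represented (by assumption) as a triple `(start, finals, arcs)` in which an arc is `(source, label, target)` and the label `None` marks an $\epsilon$-arc. Every arc whose label names another component automaton is replaced by $\epsilon$-arcs into and out of a freshly numbered copy of that automaton, built recursively; as in the text, the recursion terminates because the dependency graph is acyclic. The names `expand` and `accepts` are ours.

```python
import itertools


def expand(auts, x, counter=None):
    """Return aut(x) with every pseudoterminal arc replaced by a fresh copy of its automaton.

    auts maps a nonterminal to (start, finals, arcs); an arc is (source, label, target)
    and label None marks an epsilon-arc. Labels that are keys of auts are pseudoterminals.
    """
    counter = itertools.count() if counter is None else counter
    start, finals, arcs = auts[x]
    names = {}

    def fresh(q):  # give every state of this copy a globally unique number
        if q not in names:
            names[q] = next(counter)
        return names[q]

    out = []
    for s, label, t in arcs:
        if label in auts:
            sub_start, sub_finals, sub_arcs = expand(auts, label, counter)
            out.extend(sub_arcs)
            out.append((fresh(s), None, sub_start))
            out.extend((f, None, fresh(t)) for f in sub_finals)
        else:
            out.append((fresh(s), label, fresh(t)))
    return fresh(start), {fresh(f) for f in finals}, out


def accepts(fsa, word):
    """Run an epsilon-NFA (start, finals, arcs) on a sequence of terminal labels."""
    start, finals, arcs = fsa

    def eclose(qs):
        seen, todo = set(qs), list(qs)
        while todo:
            q = todo.pop()
            for s, label, t in arcs:
                if s == q and label is None and t not in seen:
                    seen.add(t)
                    todo.append(t)
        return seen

    current = eclose({start})
    for a in word:
        current = eclose({t for s, label, t in arcs if s in current and label == a})
    return bool(current & finals)


# Toy component automata: S -> NP VP, NP -> d n, VP -> v NP (pseudoterminals NP and VP).
AUTS = {
    "S": (0, {2}, [(0, "NP", 1), (1, "VP", 2)]),
    "NP": (0, {2}, [(0, "d", 1), (1, "n", 2)]),
    "VP": (0, {2}, [(0, "v", 1), (1, "NP", 2)]),
}
print(accepts(expand(AUTS, "S"), ["d", "n", "v", "d", "n"]))  # True
```

Copies are made per occurrence, so the expanded automaton can be much larger than the sum of the component automata; this is one reason to determinize and minimize each aut($X$) before recombination.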

3 Formal Properties 

We will show now that the basic approximation algorithm described informally in the previous section is sound for arbitrary CFGs and is exact for left-linear and right-linear CFGs. From those results, it will be easy to see that the extended algorithm based on decomposing the input grammar into strongly connected components is also sound, and is exact for CFGs in which every strongly connected component is either left linear or right linear.

In what follows, $G$ is a fixed CFG with terminal vocabulary $\Sigma$, nonterminal vocabulary $N$ and start symbol $S$, and $V = \Sigma \cup N$.

3.1 Soundness 

Let $\mathcal{M}$ be the characteristic machine for $G$, with state set $Q$, start state $s_0$, final states $F$, and transition function $\delta : Q \times V \to Q$. As usual, transition functions such as $\delta$ are extended from input symbols to input strings by defining $\delta(s, \epsilon) = s$ and $\delta(s, X\beta) = \delta(\delta(s, X), \beta)$. The shift-reduce recognizer $\mathcal{R}$ associated to $\mathcal{M}$ has the same states, start state and final states as $\mathcal{M}$. Its configurations are triples $(s, \sigma, w)$ of a state, a stack and an input string. The stack is a sequence of pairs $(s, X)$ of a state and a symbol. The transitions of the shift-reduce recognizer are given as follows:

Shift: $(s, \sigma, xw) \vdash (s', \sigma(s, x), w)$ if $\delta(s, x) = s'$.

Reduce: $(s, \sigma\tau, w) \vdash (\delta(s', A), \sigma(s', A), w)$ if either (1) $A \to \cdot$ is a completed dotted rule in $s$, $s' = s$ and $\tau$ is empty, or (2) $A \to X_1 \ldots X_n \cdot$ is a completed dotted rule in $s$, $\tau = (s_1, X_1) \ldots (s_n, X_n)$ and $s' = s_1$.

The initial configurations of $\mathcal{R}$ are $(s_0, \epsilon, w)$ for some input string $w$, and the final configurations are $(s, (s_0, S), \epsilon)$ for some state $s \in F$. A derivation of a string $w$ is a sequence of configurations $c_0, \ldots, c_m$ such that $c_0 = (s_0, \epsilon, w)$, $c_m$ is final, and $c_{i-1} \vdash c_i$ for $1 \le i \le m$.

Let $s$ be a state. We define the set Stacks($s$) to contain every sequence $(q_0, X_0) \ldots (q_k, X_k)$ such that $q_0 = s_0$, $q_i = \delta(q_{i-1}, X_{i-1})$ for $1 \le i \le k$, and $s = \delta(q_k, X_k)$. In addition, Stacks($s_0$) contains the empty sequence $\epsilon$. By construction, it is clear that if $(s, \sigma, w)$ is reachable from an initial configuration in $\mathcal{R}$, then $\sigma \in$ Stacks($s$).

A stack congruence on $\mathcal{R}$ is a family of equivalence relations $\equiv_s$ on Stacks($s$) for each state $s \in Q$ such that if $\sigma \equiv_s \sigma'$ and $\delta(s, X) = s'$, then $\sigma(s, X) \equiv_{s'} \sigma'(s, X)$. A stack congruence $\equiv$ partitions each set Stacks($s$) into equivalence classes $[\sigma]^s$ of the stacks in Stacks($s$) equivalent to $\sigma$ under $\equiv_s$.

Each stack congruence $\equiv$ on $\mathcal{R}$ induces a corresponding unfolded recognizer $\mathcal{R}_\equiv$. The states of the unfolded recognizer are pairs $(s, [\sigma]^s)$, notated more concisely as $[\sigma]^s$, of a state and stack equivalence class at that state. The initial state is $[\epsilon]^{s_0}$, and the final states are all $[\sigma]^s$ with $s \in F$ and $\sigma \in$ Stacks($s$). The transition function $\delta_\equiv$ of the unfolded recognizer is defined by

$$\delta_\equiv([\sigma]^s, X) = [\sigma(s, X)]^{\delta(s, X)}$$

That this is well-defined follows immediately from the definition of stack congruence.

The definitions of dotted rules in states, configurations, shift and reduce transitions given above carry over immediately to unfolded recognizers. Also, the characteristic recognizer can be seen as an unfolded recognizer for the trivial coarsest congruence.

Unfolding a characteristic recognizer does not change the language accepted: 

Proposition 1 Let $G$ be a CFG, $\mathcal{R}$ its characteristic recognizer with transition function $\delta$, and $\equiv$ a stack congruence on $\mathcal{R}$. Then $\mathcal{R}_\equiv$ and $\mathcal{R}$ are equivalent.

Proof: We show first that any string $w$ accepted by $\mathcal{R}_\equiv$ is accepted by $\mathcal{R}$. Let $d$ be a configuration of $\mathcal{R}_\equiv$. By construction, $d = ([\rho]^s, \sigma, u)$, with $\sigma = ((q_0, e_0), X_0) \ldots ((q_k, e_k), X_k)$ for appropriate stack equivalence classes $e_i$. We define $\hat d = (s, \hat\sigma, u)$, with $\hat\sigma = (q_0, X_0) \ldots (q_k, X_k)$. If $d_0, \ldots, d_m$ is a derivation of $w$ in $\mathcal{R}_\equiv$, it is easy to verify that $\hat d_0, \ldots, \hat d_m$ is a derivation of $w$ in $\mathcal{R}$.

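Conversely, let $w \in L(G)$, and let $c_0, \ldots, c_m$ be a derivation of $w$ in $\mathcal{R}$, with $c_i = (s_i, \sigma_i, u_i)$. We define $\bar c_i = ([\sigma_i]^{s_i}, \bar\sigma_i, u_i)$, where $\bar\epsilon = \epsilon$ and $\overline{\sigma(s, X)} = \bar\sigma([\sigma]^s, X)$.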

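If $c_{i-1} \vdash c_i$ is a shift move, then $u_{i-1} = x u_i$ and $\delta(s_{i-1}, x) = s_i$. Therefore,

$$\delta_\equiv([\sigma_{i-1}]^{s_{i-1}}, x) = [\sigma_{i-1}(s_{i-1}, x)]^{\delta(s_{i-1}, x)} = [\sigma_i]^{s_i}$$

Furthermore,

$$\bar\sigma_i = \overline{\sigma_{i-1}(s_{i-1}, x)} = \bar\sigma_{i-1}([\sigma_{i-1}]^{s_{i-1}}, x)$$

Thus we have

$$\bar c_{i-1} = ([\sigma_{i-1}]^{s_{i-1}}, \bar\sigma_{i-1}, x u_i)$$

$$\bar c_i = ([\sigma_i]^{s_i}, \bar\sigma_{i-1}([\sigma_{i-1}]^{s_{i-1}}, x), u_i)$$

with $\delta_\equiv([\sigma_{i-1}]^{s_{i-1}}, x) = [\sigma_i]^{s_i}$. Thus, by definition of shift move, $\bar c_{i-1} \vdash \bar c_i$ in $\mathcal{R}_\equiv$.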
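Assume now that $c_{i-1} \vdash c_i$ is a reduce move in $\mathcal{R}$. Then $u_i = u_{i-1}$, and we have a state $s$ in $\mathcal{R}$, a symbol $A \in N$, a stack $\sigma$ and a sequence $\tau$ of state-symbol pairs such that

$$s_i = \delta(s, A), \qquad \sigma_{i-1} = \sigma\tau, \qquad \sigma_i = \sigma(s, A)$$

and either

(a) $A \to \cdot$ is in $s_{i-1}$, $s = s_{i-1}$ and $\tau = \epsilon$, or

(b) $A \to X_1 \ldots X_n \cdot$ is in $s_{i-1}$, $\tau = (q_1, X_1) \ldots (q_n, X_n)$ and $s = q_1$.

Let $\bar s = [\sigma]^s$. Then

$$\delta_\equiv(\bar s, A) = [\sigma(s, A)]^{s_i} = [\sigma_i]^{s_i}$$

We now define a pair sequence $\bar\tau$ to play the same role in $\mathcal{R}_\equiv$ as $\tau$ does in $\mathcal{R}$. In case (a) above, $\bar\tau = \epsilon$. Otherwise, let $\tau_1 = \epsilon$ and $\tau_j = \tau_{j-1}(q_{j-1}, X_{j-1})$ for $2 \le j \le n$, and define $\bar\tau$ by

$$\bar\tau = ([\sigma\tau_1]^{q_1}, X_1) \ldots ([\sigma\tau_j]^{q_j}, X_j) \ldots ([\sigma\tau_n]^{q_n}, X_n)$$

Then

$$\bar\sigma_{i-1} = \overline{\sigma\tau} = \overline{\sigma(q_1, X_1) \ldots (q_{n-1}, X_{n-1})}\,([\sigma\tau_n]^{q_n}, X_n) = \bar\sigma([\sigma\tau_1]^{q_1}, X_1) \ldots ([\sigma\tau_n]^{q_n}, X_n) = \bar\sigma\bar\tau$$

$$\bar\sigma_i = \overline{\sigma(s, A)} = \bar\sigma([\sigma]^s, A) = \bar\sigma(\bar s, A)$$

Thus

$$\bar c_i = (\delta_\equiv(\bar s, A), \bar\sigma(\bar s, A), u_i)$$

$$\bar c_{i-1} = ([\sigma_{i-1}]^{s_{i-1}}, \bar\sigma\bar\tau, u_{i-1})$$

which by construction of $\bar\tau$ immediately entails that $\bar c_{i-1} \vdash \bar c_i$ is a reduce move in $\mathcal{R}_\equiv$. □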
For any unfolded state $p$, let Pop($p$) be the set of states reachable from $p$ by a reduce transition. More precisely, Pop($p$) contains any state $p'$ such that there is a completed dotted rule $A \to \alpha\cdot$ in $p$ and a state $p''$ containing $A \to \cdot\alpha$ such that $\delta_\equiv(p'', \alpha) = p$ and $\delta_\equiv(p'', A) = p'$. Then the flattening $\mathcal{F}_\equiv$ of $\mathcal{R}_\equiv$ is an NFA with the same state set, start state and final states as $\mathcal{R}_\equiv$ and nondeterministic transition function $\phi_\equiv$ defined as follows:

• If $\delta_\equiv(p, x) = p'$ for some $x \in \Sigma$, then $p' \in \phi_\equiv(p, x)$.

• If $p' \in$ Pop($p$), then $p' \in \phi_\equiv(p, \epsilon)$.

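Let $c_0, \ldots, c_m$ be a derivation of string $w$ in $\mathcal{R}$, and put $c_i = (q_i, \sigma_i, w_i)$ and $p_i = [\sigma_i]^{q_i}$. By construction, if $c_{i-1} \vdash c_i$ is a shift move on $x$ (so that $w_{i-1} = x w_i$), then $\delta_\equiv(p_{i-1}, x) = p_i$, and thus $p_i \in \phi_\equiv(p_{i-1}, x)$. Alternatively, assume the transition is a reduce move associated to the completed dotted rule $A \to \alpha\cdot$. We consider first the case $\alpha \ne \epsilon$. Put $\alpha = X_1 \ldots X_n$. By definition of reduce move, there is a sequence of states $r_1, \ldots, r_n$ and a stack $\sigma$ such that $\sigma_{i-1} = \sigma(r_1, X_1) \ldots (r_n, X_n)$, $\sigma_i = \sigma(r_1, A)$, $r_1$ contains $A \to \cdot\alpha$, $\delta(r_1, A) = q_i$, and $\delta(r_j, X_j) = r_{j+1}$ for $1 \le j \le n$, where $r_{n+1} = q_{i-1}$. By definition of stack congruence, we will then have

$$\delta_\equiv([\sigma\tau_j]^{r_j}, X_j) = [\sigma\tau_{j+1}]^{r_{j+1}}, \qquad 1 \le j \le n,$$

where $\tau_1 = \epsilon$ and $\tau_j = (r_1, X_1) \ldots (r_{j-1}, X_{j-1})$ for $j > 1$. In particular, $[\sigma\tau_{n+1}]^{r_{n+1}} = [\sigma_{i-1}]^{q_{i-1}} = p_{i-1}$, so $\delta_\equiv([\sigma]^{r_1}, \alpha) = p_{i-1}$. Furthermore, again by definition of stack congruence, we have $\delta_\equiv([\sigma]^{r_1}, A) = p_i$. Since $[\sigma]^{r_1}$ contains $A \to \cdot\alpha$, it follows that $p_i \in$ Pop($p_{i-1}$), and thus $p_i \in \phi_\equiv(p_{i-1}, \epsilon)$. A similar but simpler argument allows us to reach the same conclusion for the case $\alpha = \epsilon$. Finally, the definition of final state for $\mathcal{R}_\equiv$ and $\mathcal{F}_\equiv$ makes $p_m$ a final state. Therefore the sequence $p_0, \ldots, p_m$ is an accepting path for $w$ in $\mathcal{F}_\equiv$. We have thus proved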

Proposition 2 For any CFG $G$ and stack congruence $\equiv$ on the canonical LR(0) shift-reduce recognizer $\mathcal{R}(G)$ of $G$, $L(G) \subseteq L(\mathcal{F}_\equiv(G))$, where $\mathcal{F}_\equiv(G)$ is the flattening of $\mathcal{R}(G)_\equiv$.

To complete the proof of soundness for the basic algorithm, we must show that the stack collapsing equivalence described informally earlier is indeed a stack congruence. A stack $\tau$ is a loop if $\tau = (s_1, X_1) \ldots (s_k, X_k)$ and $\delta(s_k, X_k) = s_1$. A stack $\tau$ is a minimal loop if no proper prefix of $\tau$ is a loop. A stack that contains a loop is collapsible. A collapsible stack $\sigma$ immediately collapses to a stack $\sigma'$ if $\sigma = \rho\tau\nu$, $\sigma' = \rho\nu$, $\tau$ is a minimal loop, and there is no other decomposition $\sigma = \rho'\tau'\nu'$ such that $\rho'$ is a proper prefix of $\rho$ and $\tau'$ is a loop. By these definitions, a collapsible stack $\sigma$ immediately collapses to a unique stack $C(\sigma)$. A stack $\sigma$ collapses to $\sigma'$ if $\sigma' = C^n(\sigma)$ for some $n \ge 0$. Two stacks are equivalent if they can be collapsed to the same uncollapsible stack. This equivalence relation is closed under suffixing; therefore it is a stack congruence. Each equivalence class has a canonical representative, the unique uncollapsible stack in it. Moreover, there are finitely many uncollapsible stacks: if a state occurred in two pairs of a stack, the segment from the first of these pairs up to the second would be a loop, so an uncollapsible stack has at most $|Q|$ pairs.

We compute the possible uncollapsible stacks associated with states as follows. To start with, the empty stack is associated with the initial state. Inductively, if stack $\sigma$ has been associated with state $s$ and $\delta(s, X) = s'$, we associate $\sigma' = \sigma(s, X)$ with $s'$ unless $\sigma'$ is already associated with $s'$ or $s'$ occurs as a state in $\sigma'$, in which case a suffix of $\sigma'$ would be a loop and $\sigma'$ thus collapsible. Since there are finitely many uncollapsible stacks, the above computation must terminate.
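A sketch of the unfolded construction with the loop-collapsing congruence follows, repeating the compact LR(0) construction under the same assumed encoding (dotted rules as `(lhs, rhs, dot)`, auxiliary start symbol `S'`, breadth-first state numbering). An unfolded state is a pair `(s, sigma)`, where `sigma` is an uncollapsible stack of `(state, symbol)` pairs. When pushing a pair produces a stack in which the target state $s'$ already occurs, the sketch cuts the stack back to just before the first pair whose state is $s'$; since the old stack had no loop, that segment is the leftmost minimal loop, so the result is $C(\sigma')$. Flattening then applies the definition of Pop($p$) literally to the unfolded states. All function names are our own.

```python
from collections import deque


def closure(items, g):
    """Close dotted rules (lhs, rhs, dot) under predictions B -> .gamma."""
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in g:
            for gamma in g[rhs[dot]]:
                if (rhs[dot], gamma, 0) not in result:
                    result.add((rhs[dot], gamma, 0))
                    todo.append((rhs[dot], gamma, 0))
    return frozenset(result)


def characteristic_machine(grammar, start, aux="S'"):
    """LR(0) states, numbered breadth-first, and transitions delta[(i, X)] = j."""
    g = dict(grammar)
    g[aux] = [(start,)]
    states = [closure({(aux, (start,), 0)}, g)]
    index, delta, queue = {states[0]: 0}, {}, deque([0])
    while queue:
        i = queue.popleft()
        for x in sorted({r[d] for _, r, d in states[i] if d < len(r)}):
            t = closure({(l, r, d + 1) for l, r, d in states[i] if d < len(r) and r[d] == x}, g)
            if t not in index:
                index[t] = len(states)
                states.append(t)
                queue.append(index[t])
            delta[(i, x)] = index[t]
    return states, delta


def unfold(delta):
    """Reachable unfolded states (s, sigma), sigma an uncollapsible stack, and their moves."""
    def step(u, x):
        s, sigma = u
        t = delta[(s, x)]
        pushed = sigma + ((s, x),)
        for i, (q, _) in enumerate(pushed):
            if q == t:  # pushed[i:] is the leftmost minimal loop: collapse it
                return (t, pushed[:i])
        return (t, pushed)

    start = (0, ())
    seen, todo, trans = {start}, [start], {}
    while todo:
        u = todo.pop()
        for _, x in [key for key in delta if key[0] == u[0]]:
            v = trans[(u, x)] = step(u, x)
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return seen, trans


def accepts(grammar, start, word, aux="S'"):
    """Flatten the unfolded recognizer (reductions become Pop epsilon-arcs) and run it."""
    states, delta = characteristic_machine(grammar, start, aux)
    unfolded, trans = unfold(delta)
    eps = {u: set() for u in unfolded}
    for u in unfolded:
        for lhs, rhs, dot in states[u[0]]:
            if lhs == aux or dot < len(rhs):
                continue
            for v in unfolded:  # v plays the role of p'' in the definition of Pop(u)
                if (lhs, rhs, 0) in states[v[0]]:
                    w = v
                    for x in rhs:
                        w = trans[(w, x)]
                    if w == u:
                        eps[u].add(trans[(v, lhs)])
    finals = {u for u in unfolded if any(l == aux and d == 1 for l, _, d in states[u[0]])}

    def eclose(us):
        seen, todo = set(us), list(us)
        while todo:
            for v in eps[todo.pop()]:
                if v not in seen:
                    seen.add(v)
                    todo.append(v)
        return seen

    current = eclose({(0, ())})
    for a in word:
        current = eclose({trans[(u, a)] for u in current if (u, a) in trans})
    return bool(current & finals)


G2 = {"S": [("a", "X", "a"), ("b", "X", "b")], "X": [("c",)]}
RL = {"S": [("a", "S"), ("b",)]}
print([w for w in ["aca", "bcb", "acb", "bca"] if accepts(G2, "S", w)])  # ['aca', 'bcb']
```

With this unfolding $G_2$ is approximated exactly, because the two routes into the state reached after $c$ now carry different stacks. For the right-linear grammar $S \to aS \mid b$, whose characteristic machine has a self-loop on $a$, the loop is collapsed back into a single unfolded state, so the exploration terminates.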
When the grammar $G$ is first decomposed into strongly connected components def($X$), each approximated by aut($X$), the soundness of the overall construction follows easily by induction on the partial order of strongly connected components and by the soundness of the approximation of def($X$) by aut($X$), which guarantees that each $G$ sentential form over $\Sigma \cup (\mathrm{rhs}(X) - \mathrm{comp}(X))$ accepted by def($X$) is accepted by aut($X$).

3.2 Exactness 

While it is difficult to decide what should be meant by a "good" approximation, we observed earlier that a desirable feature of an approximation algorithm is that it be exact for a wide class of CFGs generating regular languages. We show in this section that our algorithm is exact for both left-linear and right-linear CFGs, and as a consequence for CFGs that can be decomposed into independent left- and right-linear components. On the other hand, a theorem of Ullian's [?] shows that there can be no partial algorithm mapping CFGs to FSAs that terminates on every CFG yielding a regular language $L$ with an FSA accepting exactly $L$.

The proofs that follow rely on the following basic definitions and facts about the LR(0) construction. Each LR(0) state $s$ is the closure of a certain set of dotted rules, its core. The closure $[R]$ of a set $R$ of dotted rules is the smallest set of dotted rules containing $R$ that contains $B \to \cdot\gamma$ whenever it contains $A \to \alpha \cdot B\beta$ and $B \to \gamma$ is in $G$. The core of the initial state $s_0$ contains just the dotted rule $S' \to \cdot S$. For any other state $s$, there is a state $s'$ and a symbol $X$ such that $s$ is the closure of the core consisting of all dotted rules $A \to \alpha X \cdot \beta$ where $A \to \alpha \cdot X\beta$ belongs to $s'$.

3.2.1 Left-Linear Grammars 

A CFG $G$ is left-linear if each rule in $G$ is of the form $A \to B\beta$ or $A \to \beta$, where $A, B \in N$ and $\beta \in \Sigma^*$.

Proposition 3 Let $G$ be a left-linear CFG, and let $\mathcal{F}$ be the FSA derived from $G$ by the basic approximation algorithm. Then $L(G) = L(\mathcal{F})$.

Proof: By Proposition 2, $L(G) \subseteq L(\mathcal{F})$. Thus we need only show $L(\mathcal{F}) \subseteq L(G)$.

Since $\mathcal{M}(G)$ is deterministic, for each $\alpha \in V^*$ there is at most one state $s$ in $\mathcal{M}(G)$ reachable from $s_0$ by a path labeled with $\alpha$. If $s$ exists, we define $\bar\alpha = s$. Conversely, each state $s$ can be identified with a string $\hat s \in V^*$ such that every dotted rule in $s$ is of the form $A \to \hat s \cdot \alpha$ for some $A \in N$ and $\alpha \in V^*$. Clearly, this is true for $s_0 = [S' \to \cdot S]$, with $\hat s_0 = \epsilon$. The core of any other state $s$ will by construction contain only dotted rules of the form $A \to \alpha \cdot \beta$ with $\alpha \ne \epsilon$. Since $G$ is left linear, $\beta$ must be a terminal string, so closure adds nothing and $s$ is equal to its core. Therefore every dotted rule $A \to \alpha \cdot \beta$ in $s$ results from dotted rule $A \to \cdot\alpha\beta$ in $s_0$ by a unique transition path labeled by $\alpha$ (since $\mathcal{M}(G)$ is deterministic). This means that if $A \to \alpha \cdot \beta$ and $A' \to \alpha' \cdot \beta'$ are in $s$, it must be the case that $\alpha = \alpha'$.

To go from the characteristic machine $\mathcal{M}(G)$ to the FSA $\mathcal{F}$, the algorithm first unfolds $\mathcal{M}(G)$ using the stack congruence relation, and then flattens the unfolded machine by replacing reduce moves with $\epsilon$-transitions. However, the above argument shows that the only stack possible at a state $s$ is the one corresponding to the transitions given by $\hat s$, and thus there is a single stack congruence state at each state. Therefore, $\mathcal{M}(G)$ will only be flattened, not unfolded. Hence the transition function $\phi$ for the resulting flattened automaton $\mathcal{F}$ is defined as follows, where $\alpha \in N\Sigma^* \cup \Sigma^*$, $a \in \Sigma$, and $A \in N$:

(a) $\phi(\alpha, a) = \{\alpha a\}$

(b) $\phi(\alpha, \epsilon) = \{A \mid A \to \alpha \in G\}$

The start state of $\mathcal{F}$ is $\epsilon$. The only final state is $S$.

We will establish the connection between $\mathcal{F}$ derivations and $G$ derivations. We claim that if there is a path from $\alpha$ to $S$ labeled by $w$, then either there is a rule $A \to \alpha x$ such that $w = xy$ and $S \Rightarrow^* Ay \Rightarrow \alpha xy$, or $\alpha = S$ and $w = \epsilon$. The claim is proved by induction on $|w|$.

For the base case, suppose $|w| = 0$ and there is a path from $\alpha$ to $S$ labeled by $w$. Then $w = \epsilon$, and either $\alpha = S$, or there is a path of $\epsilon$-transitions from $\alpha$ to $S$. In the latter case, $S \Rightarrow^* A \Rightarrow \alpha$ for some $A \in N$ and rule $A \to \alpha$, and thus the claim holds with $x = y = \epsilon$.

Now, assume that the claim is true for all $|w| < k$, and suppose there is a path from $\alpha$ to $S$ labeled $w'$, for some $|w'| = k$. Then $w' = aw$ for some terminal $a$ and $|w| < k$, and there is a path from $\alpha a$ to $S$ labeled by $w$. By the induction hypothesis, $S \Rightarrow^* Ay \Rightarrow \alpha a x' y$, where $A \to \alpha a x'$ is a rule and $x'y = w$ (since $\alpha a \ne S$). Letting $x = a x'$, we have the desired result.

If $w \in L(\mathcal{F})$, then there is a path from $\epsilon$ to $S$ labeled by $w$. Thus, by the claim just proved, $S \Rightarrow^* Ay \Rightarrow xy$, where $A \to x$ is a rule and $w = xy$ (since $\epsilon \ne S$). Therefore, $S \Rightarrow^* w$, so $w \in L(G)$, as desired. □
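For left-linear grammars the proof gives the flattened automaton $\phi$ explicitly, so acceptance can be checked without building $\mathcal{M}(G)$ at all. The sketch below is our illustration rather than part of the implementation: states are the strings $\alpha$ themselves (tuples of symbols), rule (b) supplies the $\epsilon$-arcs and rule (a) the terminal arcs. After a terminal arc, strings that are not prefixes of any right-hand side are discarded, since no $\epsilon$-arc leaves them and no extension of them can become a prefix.

```python
def left_linear_accepts(grammar, start, word):
    """Simulate the flattened automaton phi of a left-linear grammar.

    States are tuples of symbols: () is the start state and (start,) the final state.
    """
    rules = [(a, rhs) for a, alts in grammar.items() for rhs in alts]
    prefixes = {rhs[:i] for _, rhs in rules for i in range(len(rhs) + 1)}

    def eclose(alphas):
        # (b): an epsilon-arc from alpha to A for each rule A -> alpha
        seen, todo = set(alphas), list(alphas)
        while todo:
            alpha = todo.pop()
            for a, rhs in rules:
                if rhs == alpha and (a,) not in seen:
                    seen.add((a,))
                    todo.append((a,))
        return seen

    current = eclose({()})
    for x in word:
        # (a): alpha --x--> alpha x, kept only if alpha x can still grow into a right-hand side
        current = eclose({alpha + (x,) for alpha in current if alpha + (x,) in prefixes})
    return (start,) in current


G1 = {"S": [("A", "b")], "A": [("A", "a"), ()]}
print([w for w in ["b", "aab", "a", "ba", ""] if left_linear_accepts(G1, "S", w)])  # ['b', 'aab']
```

For $G_1$ the reachable states are $\epsilon$, $A$, $Aa$, $Ab$ and $S$, which matches the flattened acceptor for $L(G_1)$ described in Section 2.1.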

3.2.2 Right-Linear Grammars 

A CFG $G$ is right linear if each rule in $G$ is of the form $A \to \beta B$ or $A \to \beta$, where $A, B \in N$ and $\beta \in \Sigma^*$.

Proposition 4 Let $G$ be a right-linear CFG and $\mathcal{F}$ be the FSA derived from $G$ by the basic approximation algorithm. Then $L(G) = L(\mathcal{F})$.

Proof: As before, we need only show $L(\mathcal{F}) \subseteq L(G)$.

Let $\mathcal{R}$ be the shift-reduce recognizer for $G$. The key fact to notice is that, because $G$ is right-linear, no shift transition may follow a reduce transition. Therefore, no terminal transition in $\mathcal{F}$ may follow an $\epsilon$-transition, and after any $\epsilon$-transition, there is a sequence of $\epsilon$-transitions leading to the final state $[S' \to S\cdot]$. Hence $\mathcal{F}$ has the following kinds of states: the start state, the final state, states with terminal transitions entering and leaving them (we call these reading states), states with $\epsilon$-transitions entering and leaving them (prefinal states), and states with terminal transitions entering them and $\epsilon$-transitions leaving them (crossover states). Any accepting path through $\mathcal{F}$ will consist of a sequence of a start state, reading states, a crossover state, prefinal states, and a final state. The exception to this is a path accepting the empty string, which has a start state, possibly some prefinal states, and a final state.

The above argument also shows that unfolding does not change the set of strings accepted by $\mathcal{F}$, because any reduction in $\mathcal{R}_\equiv$ (or $\epsilon$-transition in $\mathcal{F}$) is guaranteed to be part of a path of reductions ($\epsilon$-transitions) leading to a final state of $\mathcal{R}_\equiv$ ($\mathcal{F}$).

Suppose now that $w = w_1 \ldots w_n$ is accepted by $\mathcal{F}$. Then there is a path from the start state $s_0$ through reading states $s_1, \ldots, s_{n-1}$ to crossover state $s_n$, followed by $\epsilon$-transitions to the final state. We claim that if there is a path from $s_i$ to $s_n$ labeled $w_{i+1} \ldots w_n$, then there is a dotted rule $A \to x \cdot yB$ in $s_i$ such that $B \Rightarrow^* z$ and $yz = w_{i+1} \ldots w_n$, where $A \in N$, $B \in N \cup \{\epsilon\}$, $y, z \in \Sigma^*$, and one of the following holds:

(a) $x$ is a nonempty suffix of $w_1 \ldots w_i$,

(b) $x = \epsilon$, $A'' \Rightarrow^* A$, $A' \to x' \cdot A''$ is a dotted rule in $s_i$, and $x'$ is a nonempty suffix of $w_1 \ldots w_i$, or

(c) $x = \epsilon$, $s_i = s_0$, and $S \Rightarrow^* A$.

We prove the claim by induction on $n - i$. For the base case, suppose there is an empty path from $s_n$ to $s_n$. Because $s_n$ is the crossover state, there must be some dotted rule $A \to x\cdot$ in $s_n$. Letting $y = z = B = \epsilon$, we get that $A \to x \cdot yB$ is a dotted rule of $s_n$ and $B = z$. The dotted rule $A \to x \cdot yB$ must have been added to $s_n$ either by closure or by shifts. If it arose from a shift, $x$ must be a nonempty suffix of $w_1 \ldots w_n$. If the dotted rule arose by closure, $x = \epsilon$, and there is some dotted rule $A' \to x' \cdot A''$ such that $A'' \Rightarrow^* A$ and $x'$ is a nonempty suffix of $w_1 \ldots w_n$.

Now suppose that the claim holds for paths from $s_i$ to $s_n$, and look at a path labeled $w_i \ldots w_n$ from $s_{i-1}$ to $s_n$. By the induction hypothesis, $A \to x \cdot yB$ is a dotted rule of $s_i$, where $B \Rightarrow^* z$, $yz = w_{i+1} \ldots w_n$, and (since $s_i \ne s_0$) either $x$ is a nonempty suffix of $w_1 \ldots w_i$, or $x = \epsilon$, $A' \to x' \cdot A''$ is a dotted rule of $s_i$, $A'' \Rightarrow^* A$, and $x'$ is a nonempty suffix of $w_1 \ldots w_i$.

In the former case, when $x$ is a nonempty suffix of $w_1 \ldots w_i$, then $x = w_j \ldots w_i$ for some $1 \le j \le i$. Then $A \to w_j \ldots w_i \cdot yB$ is a dotted rule of $s_i$, and thus $A \to w_j \ldots w_{i-1} \cdot w_i yB$ is a dotted rule of $s_{i-1}$. If $j \le i - 1$, then $w_j \ldots w_{i-1}$ is a nonempty suffix of $w_1 \ldots w_{i-1}$, and we are done. Otherwise, $w_j \ldots w_{i-1} = \epsilon$, and so $A \to \cdot w_i yB$ is a dotted rule of $s_{i-1}$. Let $y' = w_i y$. Then $A \to \cdot y'B$ is a dotted rule of $s_{i-1}$, which must have been added by closure. Hence there are nonterminals $A'$ and $A''$ such that $A'' \Rightarrow^* A$ and $A' \to x' \cdot A''$ is a dotted rule of $s_{i-1}$, where $x'$ is a nonempty suffix of $w_1 \ldots w_{i-1}$.

In the latter case, there must be a dotted rule $A' \to w_j \ldots w_{i-1} \cdot w_i A''$ in $s_{i-1}$. The rest of the conditions are exactly as in the previous case.

Thus, if $w = w_1 \ldots w_n$ is accepted by $\mathcal{F}$, then there is a path from $s_0$ to $s_n$ labeled by $w_1 \ldots w_n$. Hence, by the claim just proved, $A \to x \cdot yB$ is a dotted rule of $s_0$, and $B \Rightarrow^* z$, where $yz = w_1 \ldots w_n = w$. Because the $s_i$ in the claim is $s_0$, all the dotted rules of $s_i$ have nothing before the dot, so $x$ must be the empty string. Therefore, the only possible case is case (c). Thus, $S \Rightarrow^* A \Rightarrow yB \Rightarrow^* yz = w$, and hence $w \in L(G)$. The proof that the empty string is accepted by $\mathcal{F}$ only if it is in $L(G)$ is similar to the proof of the claim. □

Symbol   Category         Features
s        sentence         n (number), p (person)
np       noun phrase      n, p, c (case)
vp       verb phrase      n, p, t (verb type)
args     verb arguments   t
det      determiner       n
n        noun             n
pron     pronoun          n, p, c
v        verb             n, p, t

Table 1: Categories of Example Grammar

3.3 Decompositions 

If each def(X) in the strongly-connected component decomposition of G is left-linear or right-linear, it is easy to see that G generates a regular language, and that the overall approximation derived by decomposition is exact. Since some components may be left-linear and others right-linear, the overall class we can approximate exactly goes beyond purely left-linear or purely right-linear grammars.
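A minimal Python sketch of the decomposition test follows. It assumes (our reading, not a quotation of the paper) that a component counts as left-linear when every rule for a nonterminal in the component has at most one nonterminal of that same component on its right-hand side, in leftmost position, and right-linear when that occurrence is always rightmost; nonterminals of other components are treated like terminals. Tarjan's algorithm computes the strongly-connected components; the rule encoding and function names are illustrative.

```python
from collections import defaultdict

def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps every node to a set of successors."""
    index, low, on_stack, stack, comps = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)  # right side is evaluated before v is added
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component: pop it off
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(frozenset(comp))

    for v in graph:
        if v not in index:
            visit(v)
    return comps

def classify(rules):
    """Map each component of the nonterminal dependency graph to a linearity class."""
    nonterminals = {lhs for lhs, _ in rules}
    graph = defaultdict(set)
    for lhs, rhs in rules:
        graph[lhs] |= {s for s in rhs if s in nonterminals}
    result = {}
    for comp in strongly_connected_components(dict(graph)):
        left = right = True
        for lhs, rhs in rules:
            if lhs not in comp:
                continue
            # positions of symbols from this component; others act as terminals
            pos = [i for i, s in enumerate(rhs) if s in comp]
            if len(pos) > 1:
                left = right = False
            elif pos:
                left = left and pos[0] == 0
                right = right and pos[0] == len(rhs) - 1
        result[comp] = ("both" if left and right else
                        "left-linear" if left else
                        "right-linear" if right else "neither")
    return result
```

For S -> aS | Sb | c the single component is classified as neither, which is why that grammar (discussed in Section 5) is not covered by the decomposition argument even though its language is regular.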

4 Implementation and Example 

The example in the appendix is an APSG for a small fragment of English, written in the notation accepted by our grammar compiler. The categories and features used in the grammar are described in Tables 1 and 2 (categories without features are omitted). The example grammar accepts sentences such as

i give a cake to tom
tom sleeps
i eat every nice cake

but rejects ill-formed inputs such as

i sleeps
i eats a cake
i give
tom eat

Feature         Values
n (number)      s (singular), p (plural)
p (person)      1 (first), 2 (second), 3 (third)
c (case)        s (subject), o (nonsubject)
t (verb type)   i (intransitive), t (transitive), d (ditransitive)

Table 2: Features of Example Grammar

It is easy to see that each strongly-connected component of the example is either left-linear or right-linear, and therefore our algorithm will produce an equivalent FSA. Grammar compilation is organized as follows:

1. Instantiate input APSG to yield an equivalent CFG. 

2. Decompose the CFG into strongly-connected components. 

3. For each subgrammar def(X) in the decomposition: 

(a) approximate def(X) by aut(X); 

(b) determinize and minimize aut(X);

4. Recombine the aut(X) into a single FSA using the partial order of grammar components.

5. Determinize and minimize the recombined FSA. 

For small examples such as the present one, steps 2, 3 and 4 can be replaced by 
a single approximation step for the whole CFG. In the current implementation, 
instantiation of the APSG into an equivalent CFG is written in Prolog, and the 
other compilation steps are written in C, for space and time efficiency in dealing 
with potentially large grammars and automata. 
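Steps 3(b) and 5 determinize and minimize automata. The paper does not say which minimization algorithm its C implementation uses; as one compact, swapped-in illustration, the Python sketch below uses Brzozowski's method (reverse, determinize, reverse, determinize), which yields the minimal DFA provided the subset construction only builds subsets reachable from the start. Automata are encoded, by assumption, as (initial states, final states, transitions), with transitions mapping (state, symbol) to a set of states and no ε-transitions; the result is a partial DFA without a dead state.

```python
from collections import defaultdict

def reverse(fsa):
    """Swap initial and final states and flip every transition."""
    initials, finals, delta = fsa
    rdelta = defaultdict(set)
    for (p, a), targets in delta.items():
        for q in targets:
            rdelta[(q, a)].add(p)
    return set(finals), set(initials), dict(rdelta)

def determinize(fsa):
    """Subset construction that only builds subsets reachable from the start."""
    initials, finals, delta = fsa
    final_set = set(finals)
    alphabet = {a for (_, a) in delta}
    start = frozenset(initials)
    seen, todo, ddelta = {start}, [start], {}
    while todo:
        subset = todo.pop()
        for a in alphabet:
            target = frozenset(q for p in subset for q in delta.get((p, a), ()))
            if not target:
                continue  # partial DFA: no dead state is built
            ddelta[(subset, a)] = {target}
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return {start}, {s for s in seen if s & final_set}, ddelta

def minimize(fsa):
    """Brzozowski's algorithm: det(rev(det(rev(A)))) is the minimal DFA."""
    return determinize(reverse(determinize(reverse(fsa))))

def states(fsa):
    initials, finals, delta = fsa
    result = set(initials) | set(finals)
    for (p, _), targets in delta.items():
        result.add(p)
        result |= targets
    return result

def accepts(fsa, word):
    initials, finals, delta = fsa
    current = set(initials)
    for a in word:
        current = {q for p in current for q in delta.get((p, a), ())}
    return bool(current & set(finals))

# A redundant NFA for a*cb*: two copies of the b-loop after reading c.
nfa = ({0}, {1, 2}, {(0, "a"): {0}, (0, "c"): {1, 2}, (1, "b"): {1}, (2, "b"): {2}})
dfa = minimize(nfa)
```

Brzozowski's method is short but can be exponential in the worst case; Hopcroft's partition refinement is the usual alternative when the determinized automaton is already at hand.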

For the example grammar, the equivalent CFG has 78 nonterminals and 157 rules, the unfolded and flattened FSA has 2615 states and 4096 transitions, and the determinized and minimized final DFA shown in Figure 7 has 16 states and 97 transitions. The runtime for the whole process is 1.78 seconds on a Sun SparcStation 20.

Figure 7: Approximation for Example Grammar

Substantially larger grammars, with thousands of instantiated rules, have been developed for a speech-to-speech translation project [14]. Compilation times vary widely, but very long compilations appear to be caused by a combinatorial explosion in the unfolding of right recursions that will be discussed further in the next section.

5 Informal Analysis 

In addition to the cases of left-linear and right-linear grammars and decompositions into those cases discussed in Section 3, our algorithm is exact in a variety of interesting cases, including the examples of Church and Patil [?], which illustrate how typical attachment ambiguities arise as structural ambiguities on regular string sets.

The algorithm is also exact for some self-embedding grammars³ of regular languages, such as

S -> aS | Sb | c

defining the regular language a*cb*.
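The claim that S -> aS | Sb | c defines a*cb* can be spot-checked by brute force. The Python sketch below enumerates every string of length at most n derivable from the grammar (assuming, as holds here, that there are no ε-rules, so sentential forms longer than n can be pruned) and compares the result with the strings of the same length bound that match the regular expression a*cb*. It is a bounded sanity check, not a proof, and not part of the approximation algorithm; the function names are hypothetical.

```python
import re
from itertools import product

# S -> aS | Sb | c, with right-hand sides as tuples of symbols.
GRAMMAR = {"S": [("a", "S"), ("S", "b"), ("c",)]}

def language_up_to(grammar, start, max_len):
    """Terminal strings of length <= max_len derivable from start.

    Assumes no epsilon rules: every symbol then yields at least one terminal,
    so sentential forms longer than max_len can be discarded.
    """
    results, seen, frontier = set(), set(), [(start,)]
    while frontier:
        form = frontier.pop()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:  # no nonterminal left: a terminal string
            results.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:  # expand the leftmost nonterminal
            frontier.append(form[:idx] + rhs + form[idx + 1:])
    return results

def regular_up_to(pattern, alphabet, max_len):
    """Strings over alphabet of length <= max_len that fully match pattern."""
    return {"".join(w) for n in range(max_len + 1)
            for w in product(alphabet, repeat=n)
            if re.fullmatch(pattern, "".join(w))}

derived = language_up_to(GRAMMAR, "S", 6)
expected = regular_up_to(r"a*cb*", "abc", 6)
```

Both sets contain the 21 strings $a^i c b^j$ with $i + j \leq 5$.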

A more interesting example is the following simplified grammar for the structure of English noun phrases:

NP -> Det Nom | PN 

Det -> Art | NP 's 

Nom -> N | Nom PP | Adj Nom 

PP -> P NP 

The symbols Art, Adj, N, PN and P correspond to the parts of speech article, adjective, noun, proper noun and preposition, and the nonterminals Det, NP, Nom and PP to determiner phrases, noun phrases, nominal phrases and prepositional phrases, respectively. From this grammar, the algorithm derives the exact DFA in Figure 8. This example is typical of the kinds of grammars with systematic attachment ambiguities discussed by Church and Patil [?]. A string
of parts-of-speech such as 

Art N P Art N P Art N 

is ambiguous according to the grammar (only some constituents shown for simplicity):

Art [Nom N [PP P [NP Art [Nom N [PP P [NP Art N]]]]]]
Art [Nom [Nom N [PP P [NP Art N]]] [PP P [NP Art N]]]

³ A grammar is self-embedding if and only if it licenses a derivation $X \Rightarrow^* \alpha X \beta$ for nonempty $\alpha$ and $\beta$. A language is regular if and only if it can be described by some non-self-embedding grammar.
Figure 8: Acceptor for Noun Phrases



However, if the multiplicity of analyses is ignored, the string set accepted by the grammar is regular, and the approximation algorithm obtains the correct DFA. We have, however, no characterization of the class of CFGs for which this kind of exact approximation is possible.
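To make the multiplicity of analyses concrete, the following Python sketch counts parse trees of a part-of-speech string under the noun-phrase grammar above, using a memoized recursion over token spans. It assumes the grammar has no ε-rules and no cycles of unit rules (both true here); the function names are hypothetical. For strings Art N (P Art N)^k the counts grow as the Catalan numbers 1, 1, 2, 5, ..., even though, ignoring analyses, the string set is regular.

```python
from functools import lru_cache

# The noun-phrase grammar from the text; symbols without rules are parts of speech.
NP_GRAMMAR = {
    "NP": [("Det", "Nom"), ("PN",)],
    "Det": [("Art",), ("NP", "'s")],
    "Nom": [("N",), ("Nom", "PP"), ("Adj", "Nom")],
    "PP": [("P", "NP")],
}

def count_parses(grammar, start, tokens):
    """Number of parse trees, assuming no epsilon rules and no unit-rule cycles."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        if symbol not in grammar:  # part of speech: must cover exactly one token
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(count_seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        # first symbol covers tokens[i:k], the rest covers tokens[k:j]; both nonempty
        return sum(count(rhs[0], i, k) * count_seq(rhs[1:], k, j)
                   for k in range(i + 1, j))

    return count(start, 0, len(tokens))

def pp_chain(k):
    """Art N followed by k prepositional phrases P Art N."""
    return ["Art", "N"] + ["P", "Art", "N"] * k
```

For k = 2 this gives the two bracketings shown above.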

As an example of inexact approximation, consider the self-embedding CFG

S -> aSb | ε

for the nonregular language $a^n b^n$, $n \geq 0$. This grammar is mapped by the algorithm into an FSA accepting $\epsilon \mid a^+ b^+$. The effect of the algorithm is thus to "forget" the pairing between a's and b's mediated by the stack of the grammar's characteristic recognizer.
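A small Python check, up to an arbitrary length bound, of what the approximation forgets: every string of $a^n b^n$ is accepted by $\epsilon \mid a^+ b^+$, but not conversely. Here a regular expression stands in for the approximating FSA; the names are illustrative.

```python
import re
from itertools import product

APPROX = re.compile(r"(a+b+)?")  # epsilon | a+b+, the language of the approximating FSA

def in_anbn(s):
    """Membership in the original nonregular language a^n b^n, n >= 0."""
    n = len(s) // 2
    return s == "a" * n + "b" * n  # odd-length strings never match

def strings_up_to(max_len, alphabet="ab"):
    for n in range(max_len + 1):
        for w in product(alphabet, repeat=n):
            yield "".join(w)

exact = {s for s in strings_up_to(6) if in_anbn(s)}
approx = {s for s in strings_up_to(6) if APPROX.fullmatch(s)}
```

The inclusion is strict: a string such as aab is accepted by the approximation although its a's and b's are not paired.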

Our algorithm has very poor worst-case performance. First, the expansion of an APSG into a CFG, not described here, can lead to an exponential blow-up in the number of nonterminals and rules. Second, the subset calculation implicit in the LR(0) construction can make the number of states in the characteristic machine exponential in the number of CF rules. Finally, unfolding can yield another exponential blow-up in the number of states.

However, in the practical examples we have considered, the first and the last 
problems appear to be the most serious. 

The rule instantiation problem may be alleviated by avoiding full instantiation of unification grammar rules with respect to "don't care" features, that is, features that are not constrained by the rule.

The unfolding problem is particularly serious in grammars with subgrammars of the form

S -> X_1 S | ... | X_n S | Y     (1)

It is easy to see that the number of unfolded states in the subgrammar is exponential in n. This kind of situation often arises indirectly in the expansion of an APSG when some features in the right-hand side of a rule are unconstrained and thus lead to many different instantiated rules. However, from the proof of Proposition [?] it follows immediately that unfolding is unnecessary for right-linear grammars. Therefore, if we use our grammar decomposition method first and test individual components for right-linearity, unnecessary unfolding can be avoided. Alternatively, the problem can be circumvented by left factoring [?] as follows:

S -> Z S | Y
Z -> X_1 | ... | X_n



6 Related Work and Conclusions 

Our work can be seen as an algorithmic realization of suggestions of Church and Patil [?] on algebraic simplifications of CFGs of regular languages. Other work on finite-state approximations of phrase-structure grammars has typically relied on arbitrary depth cutoffs in rule application. While this may be reasonable for psycholinguistic modeling of performance restrictions on center embedding [12], it does not seem appropriate for speech recognition, where the approximating FSA is intended to work as a filter and must not reject inputs acceptable by the given grammar. For instance, depth cutoffs in the method described by Black lead to approximating FSAs whose language is neither a subset nor a superset of the language of the given phrase-structure grammar. In contrast, our method will produce an exact FSA for many interesting grammars generating regular languages, such as those arising from systematic attachment ambiguities [?]. It is important to note, however, that even when the resulting FSA accepts the same language, the original grammar is still necessary, because interpretation algorithms are generally expressed in terms of phrase structures described by that grammar, not in terms of the states of the FSA.

Several extensions of the present work may be worth investigating. 

As is well known, speech recognition accuracy can often be improved by taking into account the probabilities of different sentences. If such probabilities are encoded as rule probabilities in the initial grammar, we would need a method for transferring them to the approximating FSA. Alternatively, transition probabilities for the approximating FSA could be estimated directly from a training corpus, either by simple counting in the case of a DFA or by an appropriate version of the Baum-Welch procedure for general probabilistic FSAs.
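For a DFA, each accepted sentence follows exactly one path, so "simple counting" amounts to relative-frequency estimation of each outgoing transition per state. The Python sketch below assumes a DFA given as a dictionary from (state, word) to state, and adds a hypothetical end-of-sentence event in final states so that each state's probabilities sum to one; the toy automaton and corpus are invented for illustration, and no smoothing is applied, so unseen transitions get probability zero.

```python
from collections import Counter, defaultdict

END = "</s>"  # hypothetical end-of-sentence event, counted in a final state

def estimate_transition_probs(delta, start, finals, corpus):
    """Relative-frequency estimates of P(word | state) for a DFA.

    delta maps (state, word) -> state. Because the automaton is deterministic,
    each accepted sentence has exactly one path, so counting suffices.
    """
    counts = defaultdict(Counter)
    for sentence in corpus:
        state = start
        for word in sentence:
            if (state, word) not in delta:
                raise ValueError(f"sentence rejected by the DFA: {sentence}")
            counts[state][word] += 1
            state = delta[(state, word)]
        if state not in finals:
            raise ValueError(f"sentence rejected by the DFA: {sentence}")
        counts[state][END] += 1
    return {s: {w: c / sum(ctr.values()) for w, c in ctr.items()}
            for s, ctr in counts.items()}

# Toy two-state DFA for a*cb*; the corpus is invented for illustration.
DELTA = {(0, "a"): 0, (0, "c"): 1, (1, "b"): 1}
probs = estimate_transition_probs(DELTA, 0, {1}, [["a", "c"], ["c", "b"], ["c"]])
```

For a nondeterministic approximating FSA a sentence may have several paths, which is where an expectation-maximization procedure such as Baum-Welch replaces plain counting.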

Alternative pushdown acceptors and stack congruences may be considered with different size-accuracy tradeoffs. Furthermore, instead of expanding the APSG first into a CFG and only then approximating, one might start with a pushdown acceptor for the APSG class under consideration [?] and approximate it directly, using a generalized notion of stack congruence that takes into account the instantiation of stack items. This approach might well reduce the explosion in grammar size induced by the initial conversion of APSGs to CFGs, and also make the method applicable to APSGs with unbounded feature sets, such as general constraint-based grammars.
We do not have any useful quantitative measure of approximation quality. Formal-language-theoretic notions such as the rational index of a language [?] capture a notion of language complexity, but it is not clear how they relate to the intuition that an approximation is "worse" than another if it strictly contains it. In a probabilistic setting, a language can be identified with a probability distribution over strings. Then the Kullback-Leibler divergence [?] between the approximation and the original language might be a useful measure of approximation quality.
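For distributions P and Q over strings, the Kullback-Leibler divergence is $D(P \,\|\, Q) = \sum_w P(w) \log_2 \frac{P(w)}{Q(w)}$. The Python sketch below computes it for finite distributions given as dictionaries; the toy distributions are invented, and distributions induced by a grammar and an FSA over infinitely many strings would need more than this. The measure is asymmetric: it is finite only when Q gives nonzero probability wherever P does, the situation of an approximation that accepts a superset of the original language. Which direction, if either, best reflects approximation quality is left open by the text.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits; p and q map strings to probabilities.

    Returns infinity if p gives mass to a string to which q gives none.
    """
    total = 0.0
    for w, pw in p.items():
        if pw == 0:
            continue
        qw = q.get(w, 0.0)
        if qw == 0:
            return math.inf
        total += pw * math.log2(pw / qw)
    return total

# Invented toy distributions: an "original" language and a looser approximation.
original = {"ab": 0.5, "aabb": 0.5}
approximation = {"ab": 0.25, "aabb": 0.25, "aab": 0.25, "abb": 0.25}
```

Here the approximation wastes half of its mass on strings outside the original language, which costs one bit: $D(\text{original} \,\|\, \text{approximation}) = 1$, while the reverse divergence is infinite.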

Finally, constructions based on finite-state transducers may lead to a whole 
new class of approximations. For instance, CFGs may be decomposed into the 
composition of a simple fixed CFG with a given approximation and a complex,
varying finite-state transducer that needs no approximation. 
Appendix: APSG Formalism and Example



Nonterminal symbols (syntactic categories) may have features that specify variants of the category (e.g., singular or plural noun phrases, intransitive or transitive verbs). A category cat with feature constraints is written

cat#[c_1, ..., c_m].

Feature constraints for feature f have one of the forms

f = v                  (2)
f = c                  (3)
f = (c_1, ..., c_n)    (4)

where v is a variable name (which must be capitalized) and c, c_1, ..., c_n are feature values.

All occurrences of a variable v in a rule stand for the same unspecified value. A constraint of form (2) specifies a feature as having that value. A constraint of form (3) specifies an actual value for a feature, and a constraint of form (4) specifies that a feature may have any value from the specified set of values. The symbol "!" appearing as the value of a feature in the right-hand side of a rule indicates that that feature must have the same value as the feature of the same name of the category in the left-hand side of the rule. This notation, as well as variables, can be used to enforce feature agreement between categories in a rule, for instance, number agreement between subject and verb.
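The first compilation step, instantiating the APSG into an equivalent CFG, can be pictured with the following naive generate-and-test sketch in Python: every declared feature of every category occurrence in a rule is enumerated, and combinations that violate a constraint of form (2), (3) or (4) or a "!" are discarded. The encoding of rules as tuples and dictionaries and the naming of ground symbols such as vp[n=s,p=1,type=i] are assumptions made for illustration; the Prolog instantiation code used in the paper is not described here. The rule pron#[p=2] => 'you. shows the "don't care" blow-up discussed in Section 5: the unconstrained features n and c yield four ground rules.

```python
from itertools import product

# Declarations from the appendix; feature values are kept as strings.
DECLS = {
    "s": {"n": ("s", "p"), "p": ("1", "2", "3")},
    "np": {"n": ("s", "p"), "p": ("1", "2", "3"), "c": ("s", "o")},
    "vp": {"n": ("s", "p"), "p": ("1", "2", "3"), "type": ("i", "t", "d")},
    "v": {"n": ("s", "p"), "p": ("1", "2", "3"), "type": ("i", "t", "d")},
    "args": {"type": ("i", "t", "d")},
    "pron": {"n": ("s", "p"), "p": ("1", "2", "3"), "c": ("s", "o")},
}

def is_category(item):
    # Categories are (name, {feature: constraint}); terminals are plain strings.
    return isinstance(item, tuple)

def consistent(items, env):
    """Check every constraint against a full assignment env[(position, feature)]."""
    bindings = {}  # variable -> value, shared by all its occurrences in the rule
    for (i, f), value in env.items():
        c = items[i][1].get(f)
        if c is None:
            continue  # unconstrained ("don't care"): any declared value
        if c == "!":  # same value as the LHS feature with this name
            if env.get((0, f)) != value:
                return False
        elif isinstance(c, tuple):  # form (4): a set of values
            if value not in c:
                return False
        elif c[0].isupper():  # form (2): a variable
            if bindings.setdefault(c, value) != value:
                return False
        elif c != value:  # form (3): a constant
            return False
    return True

def symbol(i, item, env, decls):
    if not is_category(item):
        return item
    feats = ",".join(f"{f}={env[(i, f)]}" for f in decls.get(item[0], {}))
    return f"{item[0]}[{feats}]" if feats else item[0]

def instantiate(lhs, rhs, decls):
    """Expand one APSG rule into ground CFG rules by naive generate-and-test."""
    items = [lhs] + list(rhs)
    slots = [(i, f) for i, it in enumerate(items) if is_category(it)
             for f in decls.get(it[0], {})]
    domains = [decls[items[i][0]][f] for i, f in slots]
    rules = []
    for values in product(*domains):
        env = dict(zip(slots, values))
        if consistent(items, env):
            rules.append((symbol(0, lhs, env, decls),
                          tuple(symbol(i, it, env, decls)
                                for i, it in enumerate(items[1:], start=1))))
    return rules

# vp => v#[n=!,p=!,type=!], args#[type=!].
vp_rules = instantiate(("vp", {}), [("v", {"n": "!", "p": "!", "type": "!"}),
                                    ("args", {"type": "!"})], DECLS)
```

Enumerating the full product before filtering is exponential in the number of features per rule; a practical instantiator would propagate constraints instead.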

It is convenient to declare the features and possible values of categories with category declarations appearing before the grammar rules. Category declarations have the form

cat cat#[f_1 = (v_11, ..., v_1k_1),
         ...,
         f_m = (v_m1, ..., v_mk_m)].

giving all the possible values of all the features for the category.
The declaration 

start cat. 

declares cat as the start symbol of the grammar. 

In the grammar rules, the symbol "'" prefixes terminal symbols, commas are used for sequencing, and "|" for alternation.



start s.

cat s#[n=(s,p),p=(1,2,3)].
cat np#[n=(s,p),p=(1,2,3),c=(s,o)].
cat vp#[n=(s,p),p=(1,2,3),type=(i,t,d)].
cat args#[type=(i,t,d)].
cat det#[n=(s,p)].
cat n#[n=(s,p)].
cat pron#[n=(s,p),p=(1,2,3),c=(s,o)].
cat v#[n=(s,p),p=(1,2,3),type=(i,t,d)].

s => np#[n=!,p=!,c=s], vp#[n=!,p=!].

np#[p=3] => det#[n=!], adjs, n#[n=!].
np#[n=s,p=3] => pn.
np => pron#[n=!,p=!,c=!].

pron#[n=s,p=1,c=s] => 'i.
pron#[p=2] => 'you.
pron#[n=s,p=3,c=s] => 'he | 'she.
pron#[n=s,p=3] => 'it.
pron#[n=p,p=1,c=s] => 'we.
pron#[n=p,p=3,c=s] => 'they.
pron#[n=s,p=1,c=o] => 'me.
pron#[n=s,p=3,c=o] => 'him | 'her.
pron#[n=p,p=1,c=o] => 'us.
pron#[n=p,p=3,c=o] => 'them.

vp => v#[n=!,p=!,type=!], args#[type=!].

adjs => [].
adjs => adj, adjs.

args#[type=i] => [].
args#[type=t] => np#[c=o].
args#[type=d] => np#[c=o], 'to, np#[c=o].

pn => 'tom | 'dick | 'harry.

det => 'some | 'the.
det#[n=s] => 'every | 'a.
det#[n=p] => 'all | 'most.

n#[n=s] => 'child | 'cake.
n#[n=p] => 'children | 'cakes.
adj => 'nice | 'sweet.

v#[n=s,p=3,type=i] => 'sleeps.
v#[n=p,type=i] => 'sleep.
v#[n=s,p=(1,2),type=i] => 'sleep.

v#[n=s,p=3,type=t] => 'eats.
v#[n=p,type=t] => 'eat.
v#[n=s,p=(1,2),type=t] => 'eat.

v#[n=s,p=3,type=d] => 'gives.
v#[n=p,type=d] => 'give.
v#[n=s,p=(1,2),type=d] => 'give.